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Abstract 

Computational prediction of cancer associated SNPs from the large pool of SNP dataset is now being used as a tool for 
detecting the probable oncogenes, which are further examined in the wet lab experiments. The lack in prediction accuracy 
has been a major hurdle in relying on the computational results obtained by implementing multiple tools, platforms and 
algorithms for cancer associated SNP prediction. Our result obtained from the initial computational compilations suggests 
the strong chance of Aurora-A G325W mutation (rsl 15391 96) to cause hepatocellular carcinoma. The implementation of 
molecular dynamics simulation (MDS) approaches has significantly aided in raising the prediction accuracy of these results, 
but measuring the difference in the convergence time of mutant protein structures has been a challenging task while 
setting the simulation timescale. The convergence time of most of the protein structures may vary from 10 ns to 100 ns or 
more, depending upon its size. Thus, in this work we have implemented 200 ns of MDS to aid the final results obtained from 
computational SNP prediction technique. The MDS results have significantly explained the atomic alteration related with 
the mutant protein and are useful in elaborating the change in structural conformations coupled with the computationally 
predicted cancer associated mutation. With further advancements in the computational techniques, it will become much 
easier to predict such mutations with higher accuracy level. 
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Introduction 

Most genetic variations in human were portrayed by single 
nucleotide polymorphisms (SNPs), and most of them were 
thought to cause phenotypic differences amid individuals. Due 
to the use of high-throughput sequencing methods, the amount 
of identified variants in the human genome was rising rapidly, 
but studying their phenotypic outcome was a laborious task. 
Moreover, processing of these vast amounts of SNPs needs the 
improvement of regular annotation tools. Detecting patholog- 
ical genetic variants were helpful in the organization of 
genome level study, establishing the hope on the way of target 
based therapies and tailored medicine. The ability to differ- 
entiate between pathogenic and neutral nsSNPs (non-synony- 
mous Single Nucleotide Polymorphisms) using computational 
approaches could significantly aid in targeting the disease 
associated mutations. In last two decades, extensive computa- 
tional efforts had been provided to study the functional and 
structural consequences of the SNPs [1]. The primary efforts to 
recognize the patterns of SNPs taking place in the coding 
regions of genes were studied by Cargill et al. [2] and Halushka 
et al. [3]. Cargill et al. characterized SNPs in 106 candidate 
genes having potential relevance to various human diseases 
[2]. Several reports had very precisely explained the occur- 
rence and consequences of disease-associated alleles on the 
protein structures. Terp et al. [4] reported the structurally 
relevant features common in disease-associated mutations from 
the Human Gene Mutation Database (HGMD). Vitkup et al. 



[5] examined the frequency of mutations associated with 
diseases in detail. Furthermore, the detailed analysis of the 
biophysical and evolutionary distributions of the disease- 
associated mutations was demonstrated by Ferrer-Costa et al. 
[6]. Stitziel et al. [7] further analysed the locations of non- 
synonymous disease-associated polymorphisms by organizing 
the mutational data into structural classes. Moreover, Mooney 
et al. [8] demonstrated the majority of occurrence of disease- 
associated mutations in the highly conserved genomic loca- 
tions. Furthermore, Saunders and Baker [9] applied decision 
trees and a linear logistic regression to find that a protein 
structure-derived solvent accessibility term {CP density) and an 
evolutionary term derived from a PSSM matrix (SIFT) to 
examine different features associated with intolerant muta- 
tions. Moreover, Krishnan and Westhead [10] applied decision 
tree and support vector machine (SVM) techniques to the in 
vitro mutagenesis data sets as well as to SNPs in the nematode 
worm species Caenorhabditis elegans to examine their effective- 
ness in SNP characterization. These efforts were further 
accompanied by several other researches that demonstrated 
the effective use of computational tools and algorithms for 
accurate characterization of SNPs [1]. Path to deleterious SNP 
prediction has now advanced from its previous form. Apart 
from reporting the deleterious nsSNPs, now the calculation of 
phenotypic changes had further raised the prediction accuracy 
level. The previous computational SNP analysis pathways lack 
the genotype-phenotype correlations which were considered as 
the major disadvantage of the result. To overcome these 



PLOS Computational Biology | www.ploscompbiol.org 



1 



April 2014 | Volume 10 | Issue 4 | e1003318 



Prediction of Cancer Associated SNPs 



Author Summary 

The genetic mutations in human were shown by single 
nucleotide polymorphisms (SNPs), and several of them 
were supposed to cause deleterious and disease associat- 
ed phenotypic effects. The ability to differentiate between 
pathogenic and neutral nsSNPs (non-synonymous Single 
Nucleotide Polymorphisms) using computational ap- 
proaches could significantly help in targeting the disease 
associated mutations. In this work we applied computa- 
tional tools to prioritize the most deleterious disease 
associated mutation in Aurora kinases. We used sequence 
and structural based approaches to refine cancer associ- 
ated mutation and applied a long term simulation (MDS) 
to understand the changes in structural conformation and 
function of the aurora kinases upon mutation. From the 
200 ns MD simulation, we suggested that the G325W 
mutation induced major phenotypic damages in aurora-A 
kinase protein and altered its structural behaviour in 3D 
space, which might play an important role in inducing 
hepatocellular carcinoma. 

limitations, our lab studied the structural consequences of 
deleterious predicted mutations. Changes in amino-acid 
composition had wide role in affecting the native conformation 
of the protein structure. 

The vibrant nature of proteins was closely linked to its 
function, with key conformational changes usually taking place 
on timescales, varying from microseconds to several seconds. 
At a scrupulous time step, protein attains particular confor- 
mation that occupies a least amount on its free-energy 
landscape. When energy fluctuates from one minimum to 
another, it leads to alterations in the protein structure which 
controls their structural fluctuations and function. Thus to 
examine these conformational alteration occurring due to 
predicted deleterious alleles, we further incorporated molecu- 
lar dynamics calculations for native and mutant protein 
structures. Molecular dynamics (MD) simulation is an absolute 
method for modelling protein dynamic motions, the classifica- 
tion of which gives perception into the workings of macromo- 
lecular systems at spatial and temporal scales that are hard to 
access empirically. The use of molecular dynamics approach 
had been a leading platform to determine the phenotypic 
consequences induced by point mutations [11-14]. Our 
previous results had shown high accuracy of deleterious nsSNP 
prediction in CENPE and MCAK protein coding genes when 
MDS (Molecular Dynamics Simulation) was incorporated 



[15,16]. On the technical side, depending on the size of the 
protein, the simulation might take a very long time to converge 
(e.g. a hundred nanoseconds). Thus, even the most 'state-of- 
the' art or computationally intensive methods might not give 
correct results. Moreover, folding was typically much slower 
than what presently simulated through direct application of all- 
atom MD simulations. Most simulation studies of protein 
folding had relied on reduced models, implicit solvent models, 
protein unfolding or the inclusion of experimental data into the 
simulations [17]. Extended MD simulations were liable to play 
a vital role in modelling the long-timescale native and mutant 
state dynamics and their association with protein function. 
Thus, we carried out a long term molecular dynamics 
simulation to aid the computational SNP prediction results. 
Aurora kinases were chosen to detect the cancer associated 
SNPs due to the recent cases showing their involvement in 
cancer associated pathways. 

Materials and Methods 

Dataset collection 

Human Aurora kinase protein sequence data was collected form 
national centre for biological Information (NCBI) protein 
sequence database [18]. SNP information for our computational 
analysis was obtained from NCBI dbSNP (http://www.ncbi.nlm. 
nih.gov/snp/) [19]. Structure of Aurora-A kinase was obtained 
from Brookhaven Protein Data Bank [20] (PDB ID: 1MQ4). The 
mutant structure was build using homology modelling technique 
through Modeller9v9 package [21]. Modelled structures were 
refined by means of loop refining, checking wrong bond contacts 
and adding hydrogen atoms. The best selected structure was 
energy minimized by charmm27 force field for 5000 iterations 
using Gromacs 4.5.3 package [22]. 

Disease-associated SNP prediction 

There are likely chances of amino acid variants to cause 
pathological phenotype, which might directly or indirectly lead to 
cancers. We used SIFT [23], Polyphen-2 [24], PhD-SNP [25], 
Pmut [26], MutPred [27], Dr Cancer [28], Fathmm [29] and SNP 
Function Portal [30] to detect the amino acid variants which are 
likely to cause cancer. SIFT and Polyphen-2 is used to filter 
deleterious and harmful SNPs from the large pool of SNP dataset. 
Moreover, PhD-SNP and Pmut was used to filter disease 
associated SNPs. Furthermore, Dr. Cancer, Fathmm and SNP 
Function Portal was used to predict the cancer associated SNPs. 
MutPred was used to examine the molecular changes occurring in 
the protein induced by a particular amino acid variant. 




Figure 1. Location of the mutant residue, a) Native Aurora-A protein b) Mutant Aurora-A protein. Tlie mutant location is shown in red. 
doi:1 0.1 371 /journal.pcbi.1 00331 S.gOOl 



PLOS Computational Biology | www.ploscompbiol.org 



2 



April 2014 | Volume 10 | Issue 4 | e1003318 



Prediction of Cancer Associated SNPs 



Table 1. Deleterious and damaging nsSNPs prioritized using SIFT tolerance score and Polyphen-2 PSIC score. 



Gene 


SNP ID 


AA change 


SIFT 




Polyphen-2 








Tolerance Index 


Prediction 


PSIC 


PkoH lotion 


AurkA 


rsl 13451 501 


N395S 


0.94 


Tolerated 


0.000 


Benign 




rs1 46034224 


P381L 


0.00 


Deleterious 


0.994 


Damaging 




rs33923703 


M373V 


0.01 


Deleterious 


0.000 


Benign 




rsl 92771 776 


H366Y 


0.50 


Tolerated 


0.019 


Benign 




rs201 356620 


E354Q 


0.12 


Tolerated 


0.699 


Damaging 




rs45557632 


F348L 


0.01 


Deleterious 


0.353 


Benign 




rs200181472 


R343Q 


0.15 


Tolerated 


0.063 


Benign 




rsn539196 


G325W 


0.00 


Deleterious 


1.000 


Damaging 




rs45483697 


G198S 


0.00 


Deleterious 


1.000 


Damaging 




rs45520831 


R179K 


0.03 


Deleterious 


0.903 


Damaging 




rs45455492 


L114Q 


0.66 


Tolerated 


0.001 


Benign 




rs2230743 


S104L 


0.30 


Tolerated 


0.005 


Benign 




rs45533839 


V84A 


0.76 


Tolerated 


0.000 


Benign 




rsl 047972 


157V 


1.00 


Tolerated 


0.000 


Benign 




rs34572020 


P50L 


0.27 


Tolerated 


0.027 


Benign 




rsl 4561 6804 


Q34H 


0.43 


Tolerated 


0.975 


Damaging 




rs1 88825988 


R24C 


0.00 


Deleterious 


1.000 


Damaging 




rs6069717 


G11R 


0.87 


Tolerated 


0.003 


Benign 


AurkB 


rs201 709756 


A344T 


0.07 


Tolerated 


0.012 


Benign 




rsl 487 15809 


P325S 


0.03 


Deleterious 


0.933 


Damaging 




rsl 441 69786 


S313L 


0.74 


Tolerated 


0.068 


Benign 




rsl 4601 7427 


S313P 


0.30 


Tolerated 


0.019 


Benign 




rsl 46334050 


I304T 


0.00 


Deleterious 


0.989 


Damaging 




rsl 059476 


M298T 


0.42 


Tolerated 


0.000 


Benign 




rsl 51 173438 


A294T 


0.26 


Tolerated 


0.006 


Benign 




rs1 40224531 


R284C 


0.00 


Deleterious 


0.998 


Damaging 




rsl 49651 741 


G212R 


0.00 


Deleterious 


0.945 


Damaging 




rsl 4690571 3 


E204D 


0.00 


Deleterious 


0.975 


Damaging 




rs55871613 


T179M 


0.02 


Deleterious 


0.989 


Damaging 




rsl 99630207 


E174K 


0.00 


Deleterious 


0.970 


Damaging 




rsl 47097910 


A157T 


0.00 


Deleterious 


1.000 


Damaging 




rsl 481 33660 


R147W 


0.01 


Deleterious 


0.121 


Benign 




rsl 5021 6235 


V103IV1 


0.04 


Deleterious 


0.997 


Damaging 




rs3027254 


H100Q 


0.13 


Tolerated 


0.001 


Benign 




rsl 8471 3921 


R95Q 


0.00 


Deleterious 


1.000 


Damaging 




rsl 41 907099 


R70W 


0.00 


Deleterious 


0.292 


Benign 




rsl 46036524 


T69M 


0.14 


Tolerated 


0.000 


Benign 




rsl 99981 964 


S61G 


0.45 


Tolerated 


0.001 


Benign 




rs74385486 


M58I 


0.19 


Tolerated 


0.000 


Benign 




rs55878091 


A52V 


0.33 


Tolerated 


0.000 


Benign 




rs201438176 


R44H 


0.05 


Tolerated 


0.886 


Damaging 


AurkC 


rs200296015 


MIT 


0.05 


Deleterious 


0.012 


Benign 




rs200712786 


S12N 


0.43 


Tolerated 


0.003 


Benign 




rsl 461 86252 


R17Q 


0.06 


Tolerated 


0.205 


Benign 




rsl 37858773 


R28H 


0.00 


Deleterious 


0.603 


Damaging 




rs61 736320 


160V 


0.03 


Deleterious 


0.001 


Benign 




rsl 48631 645 


I60T 


0.00 


Deleterious 


0.617 


Damaging 




rs2020301 66 


N83H 


0.20 


Tolerated 


0.955 


Damaging 
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Table 1. Cont. 





Gene 


SNP ID 


AA change 


SIFT 




Polyphen-2 








Tolerance Index 


Prediction 


PSIC 


r 1 CvllCilUII 




rs45555141 


R86H 


0.00 


Deleterious 


0.862 


Damaging 




rs45623632 


Y90C 


0.00 


Deleterious 


0.983 


Damaging 




rsl 998551 50 


H92R 


0.01 


Deleterious 


0.928 


Damaging 




rs45503793 


T126IV1 


0.02 


Deleterious 


0.996 


Damaging 




rsl 47955649 


C138F 


0.00 


Deleterious 


1.000 


Damaging 




rs200042694 


K178N 


0.58 


Tolerated 


0.001 


Benign 




rsl 99933542 


D202Y 


0.00 


Deleterious 


1.000 


Damaging 




rsl 491 57434 


S222N 


0.08 


Tolerated 


0.089 


Benign 




rs45527835 


D236Y 


0.03 


Deleterious 


0.889 


Damaging 




rs147392532 


R238K 


1.00 


Tolerated 


0.000 


Benign 




rs20 1199082 


Q267E 


0.93 


Tolerated 


0.001 


Benign 




rs200445619 


Q267R 


0.54 


Tolerated 


0.006 


Benign 



The alleles highlighted in bold are predicted to be deleterious as well as damaging by SIFT and Polyphen scores. 
doi:1 0.1 371/journal.pcbi.1 00331 8.t001 



Phenotype analysis 

SNPefFect4.0 tool was used to detect the phenotypic changes 
induced by the computationally predicted cancer associated 
mutation [31]. It reports the changes in TANGO, LIMBO, 
WALTZ and ddG scores for the given mutation. Furthermore, the 
meta-analysis tool of SNPeffect4.0 used to map these changes for 
different cancer type. 



Cation-Ti sites and stabilizing residues 

Cation-Ti interactions were calculated by using CaPTURE 
program [32]. The CaPTURE program identifies energetically 
significant cation-Ti interactions within proteins in the Protein 
Data Bank (PDB). Cation-7i interactions in Aurora_A kinase 
structures are evaluated with default parameters which were 
mentioned in our previous work [33]. 



Native 



Mutant 



I 1 1 1 1 1 I 1 1 1 1 1 

0 50 100 150 200 250 0 50 100 150 200 250 

Position 



Native - Mutant 



ir 




0 50 100 150 200 250 LIMBO c Ha pero ne^bi nci i nQ sites 

Position 

Figure 2. Profile representation of the LIMBO stretches in a) Native and b) Mutant protein, c) Difference in LIMBO chaperone binding 
propensity between native and mutant, d) Molecular visualisation of LIMBO chaperone-binding sites as pink colored segments. 
doi:1 0.1 371/journal.pcbi.1 00331 8.g002 
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Table 2. LIMBO regions in mutant and wild type. 





Number 


Start 


End 


Stretch 


Score 


Native 


1 


30 


44 


KFILALKVLFKAQL 


12.87 


2 


67 


75 


ILRLYGYF 


37.58 


3 


80 


89 


VYLILEYAP 


40.52 


4 


136 


144 


LLLGSAGE 


24.91 


5 


168 


176 


DYLPPEMI 


42.67 


6 


196 


204 


FLVGKPPF 


5.66 


7 


236 


244 


RLLKHNPS 


94.00 


Mutant 


1 


30 


44 


KFILALKVLFKAQL 


12.87 


2 


67 


75 


ILRLYGYF 


37.58 


3 


80 


89 


VYLILEYAP 


40.52 


4 


136 


144 


LLLGSAGE 


24.91 


5 










6 


195 


203 


EFLVWKPP 


82.59 


7 


236 


244 


RLLKHNPS 


16.53 



For each LIMBO region, the start, end, sequence and score is given. Stretches shown in bold represents the LIMBO regions that have been severely affected by the 
mutation. 

doi:1 0.1 371/journal.pcbi.1 00331 8.t002 



Identifying stabilizing residues 

SRIDE tool was used to identify the stabilizing residues in 
Aurora-A protein [16,34]. The prediction is done by calculating 4 
essential criteria that involves surrounding hydrophobicity of a 
residue, the long-range order, stabilization centre defined by 
considering the conservation scores of residues and the contact 
map of a protein [16]. Two residues are in contact if there is at 
least one pair of heavy atoms with a distance less than the sum of 
the van der Waals radii of the two atoms plus 1.0 A [16]. A 
contact is measured long-range if it is between residues that are 
separated by at least 10 residues in the protein sequence [16]. 
Two residues are stabilizing centers (SC) elements if they are 
involved in long-range contacts and if at least one supporting 
residue can be found in each of the flanking tetra-peptides of 
these residues, in such a way that at least seven out of the possible 
nine interactions are formed between the two triplets [16]. 
Gombinedly, all these criteria are compiled to report the 
stabilizing residues [16]. 

Molecular docking analysis 

We used molecular docking analysis to carry comparative 
analysis between TPX2 binding affinity of native and mutant 
Aurora A protein using HADDOCK tool [35,36]. HAD- 
DOCK is information driven flexible docking approach to 
perform the molecular docking simulation and it runs a series 
of scripts in combination with ARIA [37,38] and CNS [39]. 
The interacting active residues of Aurora-A and TPX2 for 
docking simulations were obtained from our previous work 
[40]. 

Molecular dynamics simulation 

Molecular Dynamics Simulation was carried out by using 
Gromacs 4.5.3 package [22]. Structure of native and mutant 
Aurora-A kinase was used as starting point for MD simula- 
tions. The simulation parameters were set according to our 
previous work conducted on Aurora-A protein [40] and other 



proteins [15,32,40]. Systems were solvated in a rectangular 
box with TIP3P water molecules at 10 A marginal radius. The 
systems were neutralized by added 3 sodium ions (Na^) to the 
simulation box using the "genion" tool that accompanies with 
Gromacs package. Energy minimization was performed for 
5000 iterations by conjugate gradient method implementing 
GROMOS96 43a 1 force field. Emtol convergence criterion 
was set to 1000 kj/mol/nm. Berendsen temperature coupling 
method [41] was used to control the temperature within the 
box. Electrostatic interactions were calculated using the 
Particle Mesh Ewald method [42]. The systems were subjected 
to position restraint simulation for 5 ns and after that 
unrestraint simulation for 200 ns. We then computed the 
comparative analysis of structural deviations in native and 
mutant structure. RMSD, RMSF, SAS and Rg analysis were 
carried out by using g_rms, g_rmsf, g_sas and g_gyrate tool 
respectively. Number of distinct hydrogen bonds (NHbonds) 
was calculated using g_hbond. Moreover, we implemented 
g_densmap to obtain the atomic density distribution of the 
native and mutant protein. All graphs were plotted using 
Grace GUI toolkit 5.1.22 version. Further we carried out 
principal component analysis using essential dynamics (ED) 
method according to protocol [43] within the Gromacs 
software package. This section is an abbreviated version of 
our previously published work [15,32,40]. 

Results/Discussion 

Total 60 nsSNPs were computationally examined to detect their 
harmful and damaging properties. Out of 60 SNPs, 24 were 
calculated to be deleterious as well as damaging using SIFT and 
Polyphen2 servers (Table 1). Among these 24 nsSNPs, 7 were 
accounted to be exceedingly deleterious and damaging with 
tolerance index of zero and PSIG score 1 . To further categorize 
the expected deleterious nsSNP's as cancer-associated, we used 
PhD-SNP, Pmut, MutPred, Dr Cancer, Fathmm and SNP 
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1 1 

polymorphism unclassified 

MutationTypes 



■ disease (N=4) 
o polymorphism (N=8) 
unclassified (N=3) 




I II II I I II I I I ! I I I I I I I I I I I M I i I I II I I I I I II I I I I I I I I I M 
-10.0 -8.0 -5.9 -3.9 -1.8 0.2 1.8 3.5 5.1 6.7 8.4 

Difference in Stability 



disease (N=4) 
polymorphism (N=8) 
unclassified (N=:3) 




I I I I I II I II I II miH I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M ' 
-1000 -755 -551 -347 -143 20 184 388 592 796 1000 

Difference in Limbo score 



Figure 3. Representation of dLIMBO and ddG scores for the mutations associated with Hepatocellular carcinoma, a) Difference in 
stability scores b) Difference in stability scores with respect to its frequency of occurrence, c) Difference in LIMBO scores with repect to its frequency 
of occurrence. 

doi:l 0.1 371 /journal.pcbi.l 00331 8.g003 



Function Portal. Combinedly we reported Aurora-A G325W 
mutation as cancer-associated (Table SI). A very interesting 
scenario was observed in the results obtained from SNP Function 
Portal. Aurora-A G325W mutation was predicted to be associated 
with multiple cancer cases including Neurofibrosarcoma, Prostate 
cancer, Colorectal cancer. Breast cancer, Li-Fraumeni syndrome. 
Osteosarcoma, Hepatocellular carcinoma. Histiocytoma, Naso- 
pharyngeal carcinoma. Thyroid carcinoma. Pancreatic cancer. 
Breast cancer. Adrenal cortical carcinoma. Gastric cancer. 
Endometrial carcinoma. Colorectal cancer. Lung and Ovarian 
Adenocarcinoma. This large number of cancer-associated predic- 
tions directly suggested that the G325W Aurora-A kinase mutation 
was likely to be associated with cancer cases. Fig. 1 showed the 
position of mutant residue in Aurora-A kinase protein. 

Without studying the phenotypic evidences associated with a 
particular mutation, the conclusion derived from computational 
studies are not easily reliable. Thus we implemented SNPEfrect4.0 
to examine the phenotypic consequences of G325W mutation at 



molecular level. Significant fluctuation in the LIMBO score for 
G325W mutation was observed. For G325W mutation, dLIMBO 
equaled —335.67 which meant that the mutation decreased the 
chaperone binding propensity of the protein. Fig. 2 represents the 
position of the chaperone-binding sites in the protein. In Fig 2, it 
could be clearly seen that the mutation had shifted the chaperone- 
binding sites from the 7^^^ stretch to the 6* stretch. In table 2, we 
could see that the 6* stretch had shown significant rise in the 
chaperone-binding tendency where in native, the LIMBO score of 
6* stretch was 5.66 whereas in mutant the LIMBO score 
increased to 82.59. Furthermore the 7* stretch showed significant 
loss in the chaperone-binding tendency in mutant when compared 
to the native. Moreover, the 5* chaperone-binding stretch was 
completely absent in the mutant protein, whereas it was shown to 
be active in native Aurora-A kinase. Molecular chaperones are 
protein molecules that support in the non-covalent folding or 
unfolding and the assembly or disassembly of other biomolecular 
structures. Furthermore, it disallowed both newly synthesised 
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Figure 4. Backbone RMSDs are shown as a function of time for 
native and mutant Aurora-A protein structures at 300 K. Native 
is shown in black and mutant in red. 
doi:1 0.1 371/journal.pcbi.1 00331 8.g004 

polypeptide chains and assembled subunits from aggregating into 
non-functional structures. Decrease in chaperone-binding tenden- 
cy of Aurora-A might severely damage the folding pattern of the 
protein. Moreover, it would lead to the formation of non- 
functional protein aggregates which might cause disease-associated 
phenomena. Furthermore, major stability loss was observed in the 
case of mutant when FoldX predictions from SNPEffect4.0 were 
taken into account. The mutant showed rise in ddG value of 
7.28 kcal/mol. This implied that the mutation severely reduced 
the protein stability. Based on the dLIMBO and ddG scores it was 
evident that the mutation had induced loss in the chaperone- 
binding tendency of protein which further led to severe 
conformational changes in the protein motif and reduced its 
stability. We subjected these scores for the meta-analysis in order 
to detect the type of cancer which was most likely to occur in the 
case of G325A Aurora-A kinase mutation. The dLIMBO and ddG 
scores were plotted for previously reported cancer cases. Our 
dLIMBO and ddG scores were shown to lie in the dLIMBO and 
ddG distribution range of hepatocellular carcinoma associated 
mutations (Fig. 3). For other cancer cases, no positive distributions 
for the above obtained scores were observed. This showed that the 
G325W mutation might play an important role in inducing 
hepatocellular carcinoma associated phenotypes, whereas no 
significant correlation was observed for other cancer cases. 
Other scores obtained from SNPEffect4.0 did not have much 
significance. 

By the interaction with TPX2, the catalytic activity of Aurora-A 
protein increased up to 1 5-fold [44] . TPX2 bound to and localized 
Aurora-A to spindle microtubules along with microtubules on the 
periphery of spindle poles [45]. To examine the changes in 
interaction pattern of mutant Aurora-A with TPX2 when 
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4(X) 



Figure 5. RMSF of the backbone CAs of Ca atoms of native and 
mutant Aurora-A protein versus time at 300 K. Native is shown in 
black and mutant in red. 
doi:1 0.1 371/journal.pcbi.1 00331 8.g005 



7 



April 2014 | Volume 10 | Issue 4 | e1003318 



Prediction of Cancer Associated SNPs 




Time (ps) 



Figure 6. Radius of gyration of Ca atoms of native and mutant 
Aurora-A protein versus time at 300 K. Native is shown in black 
and mutant in red. 

doi:1 0.1 371 /journal.pcbi.1 00331 8.g006 
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Figure 7. Solvent-accessible surface area (SASA) of native and 
mutant Aurora-A protein versus time at 300 K. Native is shown in 
black and mutant in red. 
doi:1 0.1 371/journal.pcbi.1 00331 8.g007 



compared to the native, we conducted molecular docking analysis. 
In-depth analysis of the docked complexes revealed notable 
differences between interaction patterns of native and mutant 
Aurora-A protein with TPX2. As we described in [14], 
computation of interaction energy was very important to 
comprehend the attraction level of biological partners. Overall 
interaction energy amid Aurora-A (native /mutant) and TPX2 
protein mainly contributed to their electrostatic and van der 
Waals interaction energy. In native, there were a major 
contribution of electrostatic and van der Waals energy of — 
97.7+/-24.1 kcal/mol and -41.0+/-5.6 kcal/mol respectively 
(Table 3). The total protein-protein interaction energy of native 
complex was — 153. 17 kcal/mol (Table 3). All the results 
obtained from docking analysis were confirmed from the results 
obtained in our previous work on Aurora-A kinase [40]. 
Electrostatic and van der Waals energy confirms large amount 
of surface complementarities amid native Aurora-A and TPX2. 
On the contrary, mutant complex showed less van der Waals and 
electrostatic energies of -17.9+/ -2.1 and -59.3+/- 14.8 kcal/ 
mol, respectively, and relatively less amount of total interaction 
energy —92.96 kcal/mol when compared to the native (Table 3). 
More negative value of interaction energy for native as compared 
to mutant indicated its high aflfmity towards TPX2. Approxima- 
tions of van der Waals interaction energy were computed to 
provide a theoretical quantitative estimation of the protein- 
protein non-bonded interactions [14]. Result showed that the 
mutation induced damaging consequences in native conforma- 
tion of Aurora-A kinase that in turn reduced its overall binding 
aflfmity to TPX2 protein. 

As we described in [14], buried surface area (BSA) is a criterion 
to assess protein surface which is not exposed to water and more 
BSA showed a compact macromolecular complex. In table 3, we 
have showed the BSA of native and mutant complexes as 
120 1.6+/ -32.9 and 758.5+/ -28.8 respectively. Consideration 
of desolvation component (the loss of interactions with the water 
phase) was important because it overcompensated the interaction 
energy and resulted in an opposite effect [14,20]. Restraints 
violation energy, desolvation energy and BSA showed good 
correlation with interaction energy and docking score of complex 
during docking simulation (Table 3). Major difference between 
docking score, interaction energy and BSA were found in mutant 
than native. The net values of native were significantly higher than 
mutant, indicating the loss-of-function due to mutation in Aurora 
A which disrupted the binding with TPX2 when compared to 
mutant (Table 3). Docking process solely depended on the position 
of binding residues (active residues) on the protein surface. It 



showed that divergence from the actual location of binding 
residues (position number 155, 157, 159 and 197) in mutant 
Aurora-A might be a reason for the loss of binding with TPX2. 
The three factors namely, score difference in docking process, 
alteration in interaction energy and in buried surface area of 
complex possibly corresponded to conformational variation of 
protein-binding surface due to mutation. 

The ddG scores obtained from SNPeflfect4.0 FoldX prediction 
implicated loss in the stability of protein. Thus we implemented 
MDS calculation to observe the molecular alterations associated 
with the stability loss. We explored RMSD, Rg, RMSF, SASA, 
NHbond, total energy and density distribution of mutant and 
native structure. RMSD for all the Ccx atoms were calculated from 
the starting structure which was described as the central origin to 
compute the protein system (Fig. 4) [46]. In Fig. 4, native and 
mutant protein showed diverge fashion of deviation from starting, 
resulting in final backbone RMSD of —0.172 nm and 0.252 nm 
respectively during the simulation. Mutant lingered distinguished 
till the end and exhibited higher aberration than native. This level 
of fluctuation together with a minute difference in average RMSD 
value after the relaxation time led to the outcome that the 
simulation produced stable trajectory, thus providing a suitable 
basis for further investigation. To observe the affect of mutation on 
the dynamic behaviour of residues, the RMSF values of mutant 
and native backbone residues were computed (Fig. 5). Inspection 
of fluctuation score exposed the occurrence of higher degree of 
flexibility in mutant. Mutant curve differed significantly and 
fluctuated at higher degree during the simulation time period, 
indicating that mutant conformation was flexible throughout the 




:e+05 



Figure 8. Average number of protein-solvent intermolecular 
hydrogen bonds in native and mutant Aurora-A protein versus 
time at 300 K. Native is shown in black and mutant in red. 
doi:1 0.1 371/joumal.pcbi.1 00331 8.g008 
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Figure 9. Graph of total energy as a function of time for mutant 
and native Aurora A protein versus time at 300 K. Mutant is 
shown in red and native in black. 
doi:1 0.1 371 /journal.pcbi.1 00331 8.g009 

simulation time and its structure acquired expanded conformation 
when compared to native. 

As we described in [46] , the radius of gyration (Rg) is the mass- 
weighted root mean square distance of group of atoms from their 
common centre of mass. Hence it provided an observation into 
global dimension of protein. Radius of gyration graph for alpha- 
carbon atoms of protein vs time at 300 K is depicted in Fig. 6. We 
observed a major fluctuation in both native and mutant between a 
time periods of 0 to 200 ns (Fig. 6). Based on Rg graph, native 
structure was found stable than mutant. The alteration in SASA of 
mutant and native protein with time is depicted in Fig. 7. Mutant 
protein showed higher value of SASA with time, while native 
indicated lower SASA value. Longer variation in Rg plot showed 
that the mutant protein might be enduring a major structural 
transition. This was further maintained by SASA result in which 
the mutant was found to reveal higher values (Fig. 7). Hydrogen 
bond plays a major factor on controlling the steady conformation 
of protein. Time dependent NHbond formations during simula- 
tion were observed in native and mutant proteins in order to 
comprehend the relationship between hydrogen bond formation 
and flexibility. Mutant structure demonstrated significantly less 
number of NHbond formation throughout the simulation when 
compared to native (Fig. 8). Furthermore, the energy spectrum 
provided a detail insight into the stability of a particular structure. 
We investigated the overall energy variation of native and mutant 
protein throughout the simulation. A significant rise in energy level 
of mutant structure was observed (Fig. 9). It provided a clear 
indication of loss of the stability in mutant structure. Moreover, the 
observation of corresponding eigenvalues showed the level of 
variation and dynamic nature of protein molecule in the 
simulation system and was mostly restricted within the first two 
eigenvectors. Our result showed higher range of eigenvector 
trajectory covered by mutant in comparison with native (Fig. 10). 

Further information on the structural plasticity of protein was 
found by the analysis of time-dependent secondary structures and 
structural fluctuations. Fig. 1 1 showed the secondary structural 
elements as a function of simulation time. Coils, (3-sheets, bends 
and turns were found in both native and mutant protein during 
simulation time period and it is depicted in fig 11. In native 
Aurora-A kinase, the residues between position 238-275 appeared 
as alpha-helix, (3-sheets and coils with few bend conformation till 
the end of the simulation but in mutant Aurora-A this region 
showed beta-sheets and turns with more traces of bend confor- 
mation. The severe conformational changes in this region 
supported our previous results obtained from SNPEffect4.0, where 
the loss in chaperone-binding tendency of this region was 
observed. In Comparison to native Aurora-A kinase, mutant 



10 




projection on eigenvector 1 (nm) 

Figure 10. Projection of the motion of the protein in phase 
space along the first two principal eigenvectors at 300 K. Native 
is shown in black and mutant in red. 
doi:1 0.1 371/joumal.pcbi.1 00331 8.g01 0 

showed significant structural changes between a region of residues 
238-275, 324-334 and 348-360 during simulation. Between 
residues 324-334, native showed more (X-helix whereas in mutant 
it attained bend with few traces of turns. Between the residues of 
348-360, (X-helix and coil conformation formed the majority of 
structure whereas in mutant, the coil conformation was only seen. 
In other regions, very slight changes in secondary structure 
conformation were observed. 

In SNPeffect4.0 results, we observed that the chaperone-binding 
location shifted from the 7* stretch to the 6* stretch. Moreover, 
the chaperone-binding activity of 5* stretch was shown to be 
completely lost. Thus we plotted the atom density distribution to 
check if these shifting had caused any major changes in the 
orientation and atomic distribution of 7* and 5* chaperone- 
binding stretches. The plot indicated major conformation loss in 
mutant as its atomic distribution differed significantly from the 
native. For the 5* stretch the mutant structure confirmed a highest 
atomic density of 6.24 nm ^ whereas in native it was 7.76 nm ^ 
(Fig. 12). Moreover, for the 7* stretch, the mutant structure 
exhibited a highest atomic density of 7.23 nm~^ whereas in native 
it was 10.3 nm ^ (Fig 13), which was in accordance to the DSSP 
results, further indicating major conformational loss in these 
segments. 

The above results strongly indicated the loss of stability of 
the mutant structure. To examine the molecular causes 
associated with these outcomes, we further investigated the 
cation- 71 interactions and the stabilizing residues in Aurora A 
protein. Cation-Ti interaction has significant contribution in 
protein stability. In proteins, cation-Ti interactions occur 
between the aromatic side chains of phenylalanine (F), tyrosine 
(Y), and tryptophan (W) and the cationic side chain of lysine 
(K) or arginine (R). It was likely to have a important role in the 
folding or stabilization of the proteins. The significance of 
cation-7i interaction had been observed in several researches 
for their corresponding role in retaining the stability of 
proteins. We observed a total of 5 energetically significant 
cation-Ti interactions (Fig 14). To scrutinize if the cation-Ti 
interactions were maintained in mutant structure, we further 
examined the bond length variations of the outlying cation-Ti 
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Figure 11. Time evolution of the secondary structural elements of the protein at 300 k (DSSP classification), a) Native and Mutant 
Aurora-A protein merged at different timescale. b) Native Aurora-A secondary structural variation c) Mutant secondary structural fluctuation. 
doi:1 0.1 371 /journal.pcbi.1 00331 8.g01 1 



interaction. The bond length fluctuation showed that the 
Arg232-Phe238, Lys240-Tyr256 and Arg357-Phe348 cation-Ti 
interactions had undergone major interaction losses (Fig 14). 
After 32900 ps there was a major loss in Arg232-Phe238 
cation-TT interaction whereas after 28400 ps there was a 
significant loss in Arg357-Phe348 interactions (Fig 14). Mutant 
trajectory of Arg37 1-Trp3 1 3 bond length was comparatively 
higher when compared to the native and maintained a 
constant trajectory during the simulation. Arg232-Phe238 
and Lys250-Tyr246 interaction were retained in the mutant 
structure. These results suggested that there was a major loss in 
cation-7i interactions in the mutant structure which might 
have induced higher fluctuation and lower in stability of the 
structure. Furthermore, the protein structure contains certain 



SG that acted as an essential component to maintain the stable 
conformation of corresponding protein structure. Residues 
AlaieO, Leuiei, Lysl62, Vall63, Asnl92, Leu210, Leu263, 
Lys271, Ile272 and Ser278 were identified as stabilizing 
centers. When the RMSF of these residues were investigated, 
we observed significant rise in their fluctuation levels (Table 4). 
This further indicated that apart from damaging the ATP 
binding domain and the cation-7i interactions, the mutation 
had also introduced major deflection in the Aurora A protein 
SC, which might have significantly contributed towards 
phenotypic changes observed in mutant G325W structure. 
All these biophysical and biochemical factors obtained from 
200 ns MD simulation had collectively suggested that the 
G325W mutation had strong evidence of inducing phenotypic 



PLOS Computational Biology | www.ploscompbiol.org 



10 



April 2014 | Volume 10 | Issue 4 | e1003318 



Prediction of Cancer Associated SNPs 




damages in Aurora-A kinase protein, and might play signifi- 
cant role in inducing hepatocellular carcinoma. These results 
could be further implemented for wet lab investigations and 
could provide an evidence of Aurora kinase genetic mutation 
in association with cancer. 



Conclusion 

By the use of multiple computational platforms and long- 
term molecular dynamics simulation approaches, we identified 
Aurora-A G325W mutation as highly deleterious and associ- 
ated with hepatocellular carcinoma. The RMSF, Rg, DSSP 
and energy fluctuation results provided a clear insight of 
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Figure 14. Distance fluctuation between the cation-ji residues a) Lys227-Phe351 b) Arg232-Phe238 c) Lys240-Tyr256 d) Arg357- 

Phe348 e) Arg371-Trp313. Native is shown in black and mutant in red. 

doi:10.1371/journal.pcbi.1003318.g014 



Table 4. The RMSF values of stabilizing residues. 



Serial Number 


Residue 


HP 


LRO 


RMSF (nm) 












Native 


Mutant 


1 


ALA160 


27.54 


0.02299 


0.0589 


0.0736 


2 


LEU161 


24.29 


0.02299 


0.0528 


0.0657 


3 


LYS162 


21.97 


0.03065 


0.0586 


0.0672 


4 


VAL163 


24.01 


0.03448 


0.0669 


0.0849 


5 


ASN192 


20.39 


0.03065 


0.0575 


0.0707 


6 


LEU210 


23.47 


0.03065 


0.0497 


0.0664 


7 


LEU263 


23.05 


0.02299 


0.0545 


0.07 


8 


LYS271 


24.49 


0.02299 


0.0589 


0.0717 


9 


ILE272 


21.37 


0.02299 


0.0507 


0.0697 


10 


SER278 


21.55 


0.03448 


0.0518 


0.0753 



doi:1 0.1 371/journal.pcbi.1 00331 8.t004 
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stability loss and conformational changes in mutant protein. 
The results showed that the long-range computational simu- 
lations provided clear picture of conformational changes 
occurring in the computationally predicted disease associated 
mutant protein structure. Through the advancements in the 
computational resources, the use of MDS for higher time scale 
would become approachable. Dedicated computer for 
biomolecular simulation will most likely bring further 
speedups in future, allowing the length of each simulation to 
move well ahead of the large timescale. These advancements 
will aid in studying the changes in the dynamics behavior 
of computationally predicted disease-associated SNPs. More- 
over, the advancements in high-performance molecular 
dynamics simulation of biomolecules will provide new pros- 
pects for the use of in silico simulations to observe mutation- 
induced changes in biophysicochemical properties of protein, 
especially for those parameters, which were well-known to 
occur at long timescales. Furthermore to the capacity to 
produce very long-range incessant simulation trajectories, 
these advances will also greatly enhance the accuracy of 
SNP predictions, ultimately allowing us to examine the 
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